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Abstract 

Computer codes are widely used to describe physical processes in lieu of physical observations. 
In some cases, more than one computer simulator, each with different degrees of fidelity, can be 
used to explore the physical system. In this work, we combine field observations and model runs 
from deterministic multi-fidelity computer simulators to build a predictive model for the real pro- 
cess. The resulting model can be used to perform sensitivity analysis for the system, solve inverse 
problems and make predictions. Our approach is Bayesian and will be illustrated through a simple 
example, as well as a real application in predictive science at the Center for Radiative Shock Hy- 
drodynamics at the University of Michigan. 
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1 Introduction 

Deterministic computer models are used to simulate a wide variety of physical processes (Sacks et al., 
1989; Santner et al., 2003). Oftentimes, a single run of the code requires considerable computational 
effort, making it infeasible to continually exercise the simulator. Instead, experimenters attempt to 
explore the computer model response (and to some extent the physical process) using a limited number 
of computer model runs. 

In some applications, several simulators of the physical process are available to describe the same 
system (Craig et al., 1998; Kennedy and O'Hagan, 2000; Qian et al., 2006 and 2008; Reese et al., 2004; 
Cumming and Goldstein, 2009), each with different levels of fidelity. The varying levels of fidelity can 
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occur, for example, because of the presence of reduced order physics in lower fidelity models, different 
levels of accuracy specified for numerical solvers or solutions obtained on finer grids. In these cases, a 
higher fidelity model is thought to better represent the physical process than a lower fidelity model, 
but also takes more computer time to produce an output than a lower fidelity model. So, combining 
relatively cheap lower fidelity model runs with more costly high fidelity runs to emulate the high 
fidelity model has been an significant problem of interest (Kennedy and O'Hagan, 2000; Qian et al., 
2006 and 2008). 

Another important application of computer models is that of calibration (e.g., Kennedy and 
O'Hagan, 2001; Higdon et al., 2004) where the aim is to combine simulator outputs with physical 
observations to build a predictive model and also estimate unknown parameters that govern the be- 
haviour of the computer model. The latter endeavour amounts to solving a sort of inverse problem, 
while the former activity is a type of regression problem. 

Motivated by applications at the Center for Radiative Shock Hydrodynamics (CRASH) at the 
University of Michigan, the aim of this work is to develop new methodology to combine outputs from 
simulators with different levels of fidelity and field observations to make predictions of the physical 
system with associated measurements of uncertainty. In the spirit similar to Kennedy and O'Hagan 
(2000 and 2001) and Higdon et al. (2004), we propose a predictive model that incorporates computer 
model outputs and field data, while attempting to find optimal values for some input parameters 
(i.e. calibration parameters). Different models are specified for each source of data (Kennedy and 
O'Hagan, 2000; Qian et al., 2006 and 2008). The approach calibrates each computer model to the 
next highest level of fidelity model, and the simulator of the highest fidelity is then calibrated to 
the field measurements. All the response surfaces are Gaussian process (GP) models and the various 
sources of information that inform predictions of the physical system are combined with a Bayesian 
hierarchical model. 

The paper is organized as follows: In section [2j we will introduce the proposed methodology and 
the GP models involved, along with the relevant priors. The framework for prediction will be discussed 
at the end of the section. A simple example from the literature and an application from CRASH are 
used to demonstrate the proposed approach in Section [3} Further discussion follows in Section [4] with 
some concluding remarks in Section [5j 
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2 A Hierarchical Model for Multi-fidelity Model Calibration 



In this section, a Bayesian hierarchical model that calibrates multi-fidelity computer simulators is 
proposed. The higher fidelity code is assumed to better represent the real world process but is assumed 
to require more computing resources to simulate the system. For ease of presentation and notation, 
we present the case where there are only two computer simulators - a high fidelity and a low fidelity 
model. It is easy to extend the proposed methodology to cases with more than two simulators, and 
this setting is discussed in Section |4| 

2.1 The Hierarchical Model 

Throughout this work, the computer models are assumed to be deterministic mathematical functions 
that map inputs to outputs. The computer codes have two types of inputs: (i) design variables, x, that 
are adjustable or measurable in the field experiments; and (ii) calibration parameters, t, whose values 
are thought to impact the physical system, but are unknown a priori. The latter inputs can only be 
adjusted within the simulator, but are not measurable in the field. In model calibration problems, the 
issue is to build a predictive model for the field process and also to estimate the unknown calibration 
parameters. 

A unique feature of the application that motivated the current work is that the calibration pa- 
rameters for the computer models are not all the same. Some of the calibration parameters, tj, are 
shared among the simulators, whereas others are required inputs only to individual simulators. The 
vectors of calibration inputs exclusive to the high and low fidelity models are denoted as and t/, 
respectively. 

First consider the low fidelity computer model with inputs (x, tj,t/) (i.e., the design variables 
and calibration parameters that are shared and unshared with the high fidelity simulator), where 
x = (xi, . . . ,x p ), t f = . . . ,i/, m/ ) and t t = (t z>1 , . . . ,fy mj ). Outputs Y t (-) from the low fidelity 

simulator, rji(-), are written as: 

y,(x,t / ,t,) = »7i(x,t / ,t,). (i) 

Similarly, the high fidelity simulator, %(•), has inputs (x, t/,t^) 3 where = (£/i,l> * * • ^h,m h )^ an d 
output Y h {-)\ 

y /l (x,t / ,t /l ) = %(x,t/,tfc). 

Both simulators are used to describe the same physical process, but will not always give the same 
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response. There are a few obvious reasons why this is the case. The lower fidelity model is inferior 
to the high fidelity simulator since it may, for example, fail to capture some processes that the high 
fidelity code can more accurately model. Furthermore, the two codes do not share all of the same 
inputs. The inputs only appear in the high fidelity model and thus, any impact that these variables 
have on the output cannot be captured by the low fidelity model. Similarly, the inputs t/ appear only 
in the low fidelity model. To address these issues, we take the approach of writing the high fidelity 
simulator as a discrepancy adjusted version of the low fidelity model (e.g. Kennedy and O'Hagan, 
2000; Qian et al., 2006): 

Y h (x, t /? t h ) = r?/(x, t f , Oi) + <S 2 (x, t/, t/0. (2) 

Specifying the first term in ^ as ^(x, amounts to partially calibrating (partially in the 
sense that the other calibration parameters must still be estimated) the first computer model to the 
second computer model. Furthermore, the discrepancy, ^(O? represents the systematic differences 
between the partially calibrated low fidelity model and the high fidelity code. Lastly, notice that ^(O 
is a function of not only the design variables - as in Kennedy and O'Hagan (2001) - but also (t/,t^). 
The calibration parameters are included in this discrepancy term because they can be modified in the 
high fidelity code. Therefore, this discrepancy term captures the systematic differences in the outputs 
from the two models over values of the design variables and the changes in the calibration inputs tf 
and th- 
in addition to the simulations, there are also field observations that are used to inform predic- 
tions. Since the higher fidelity simulator is assumed to better represent the physical process than 
the low fidelity simulator, it is natural to model the field observations with the simulator of higher 
fidelity. Similar to Kennedy and O'Hagan (2001), a discrepancy function, £/(•), is used to capture the 
systematic inadequacy of the high fidelity simulator. The field observations are noisy versions of the 
mean process, and independent and identically distributed (iid) observational errors are included in 
our specification. For input setting, x, the field process is written as: 

Y f (x) = r} h (x,0f,0 h ) + 6 f (x) + e, (3) 

where e ~ iV(0, 1/A y ). Substituting ^ into ^ allows the field observations to be written: 

Y f (x) = rn{x.,O f , Oi) + <5 2 (x, f , h ) + S f ( X ) + e, (4) 

So, the response surface for the field data is written as the sum of the calibrated low fidelity 
simulator, the calibrated discrepancy between the two different simulators, the discrepancy between 
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the high fidelity model and the data, and observational error. From here on out, we describe the 
response surfaces for the low and high fidelity simulators and the field data using the framework 



It is possible at this point to envision applications with more than two simulators, each ranked 
from low to high by levels of fidelity. The above framework can be extended to these cases. For 
example, in the case where there are three simulators of different fidelity, 771 (•), 772 (•) and 773 (•), where 
771 (•) is of the lowest fidelity and 773 (•) is best at describing the physical process. Outputs from 771 (•) 
and 772 (•) are modelled as above. Similarly, 773 (•) will be modelled using 771 (•) and some discrepancy 
functions, and the field observations will be modelled with the highest fidelity simulator 773 ( - ) . More 
on this in Section HJ 

2.2 Gaussian Process Models 

To make predictions of the physical system, response surfaces for the computer model and discrepancies 
must be estimated. We follow the common practice of emulating simulator responses using a GP (e.g., 
see Sacks et al., 1989). The reason for this, in general, boils down to the success of the GP as a non- 
parametric regression estimator and also the ability of the GP model to provide a basis for statistical 
inference for the outputs of deterministic computer codes. From a Bayesian viewpoint in this context, 
one can think of the GP as a prior distribution over the class of functions produced by the low fidelity 
simulator and the discrepancies, respectively. 

We begin by first considering the specification for the low fidelity simulator. The outputs are 
treated as a realization of a random function of the form: 



where /1, . . . , f p are regression functions, (3 = . . . , (3 p ) f is the vector of unknown regression coeffi- 
cients, and Z is a mean zero GP. We follow the convention of most computer model applications by 
specifying the mean function as a constant, /x, and model the response surface through the covariance 
structure. The covariance between observations at inputs (x^t^,t^) and (xj, tfj, tij) is specified as 



described in equations ([!]), Q and Q, respectively. 



p 



Yi (x, t /5 ti) = fi ( x > */> *0 h + z ( x > */> *0 > 



1=1 



Cov [Z (xi, t/,*, t M ) , Z (xj,t fij ,tij 




P 171 f / x2 m l 



4{tl,i,s—tl,j,s) 



rji 



■S = l .s = l .s = l 



(5) 
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where A^ is the marginal precision of the simulator r\i. The (p + rrif + m^)-vector p = (p m ,i, • • • , 
Prj hP , • • • , p r]u p+ rnf1 . . . , p r]up j rrnf j rrni ) is the vector of correlation parameters that govern the dependence 
in each of the component directions of x, tf and t; (e.g., Sacks et al., 1989; Higdon et al., 2004 and 
2008; Linkletter et al., 2006). 

The discrepancy, ^(O? captures the systematic differences between the high and low fidelity simu- 
lators as a function of the inputs, (x, t^, t/), that are adjustable in the high fidelity model. Continuing 
as above, ^(O is modelled as mean zero GP with covariance: 



rw;z(x,i,,i,,.,).z ( x,,t,,,t / ,,); = f U^ s ~ X3 ' sf U^'^ U^ f h r 



,2 



.5 — 1 S=l ,5 — 1 

= J^R((^i^f,i^h,i) , (Xj^fj^hj) ,P2) ^ ( 6 ) 

where the marginal precision of the discrepancy function is A2, and the vector of correlation parameters 

for the inputs is p 2 = (p2,l, • • • , P2,p, • • • , P2,p+ra /5 • • • , P2,p+ra / +raJ. 

The function £/(•) is the discrepancy between the response from high fidelity simulator and the 
physical process. Again, a zero mean GP is chosen. Let Xf denote the marginal precision of the 
discrepancy function, £/(•), and the vector pj — (p/,i, ■ ■ ■ P/,p) be the correlation parameters for the p 
design variables. The covariance function is written as 



1 P 

Cov[Z{^i),Z{^j)} = T-Yl p t 



X f S=l 

^-R((^^j),p f ). (7) 

T 



We define the vector of all observations and simulation outputs as Y = \Yj \Y^,Yfj , where 
Y f = (lj(xi), . . . , lj(x n/ )) T is the vector of rif field measurements, and the vectors and n\ simu- 
lated runs from the high and low fidelity simulators are = ^Y^x^, l5 t' h 1 ), . . . , Y/^x^, n ^ 5 t' h nh )^j 

and Y/ = (^(xj, t} >1? t^), . . . , ^(x*,, t* ^, t*^)) , respectively. To simplify notation, denote = 
(0f,0 h ,0i), A = (A^, A 2 , A/) and p = (p^, p 2 , pj). The likelihood for Y is 



L (Y|0, /x, A, p) oc |E Y |- 3 exp {(Y - /x) T S Y X (Y - M )} , 
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where p is the constant mean vector and 



S 2 
= Sib + I ~ ! + 




^ £/ + £ y ^ 

0. (8) 
^ j 

The covariance matrix T, m is the covariance between all the outputs and is obtained by applying 
equation ^ to each pair of the (rif + + n{) input settings of the observed and simulated data. 
Similarly, the covariance matrix £2 describes the relationship of the systematic difference between the 
two simulators and hence, is obtained by applying equation ^ to each pair of the rif input settings of 
the observed data Yf and input settings of the simulated data Y^. It is a square matrix of order 
(rif + rih). Equation Q is only applied to each pair of the rif input settings for the covariance matrix 
T,f. The covariance matrix for the measurement error (e) is given by the rif x rif diagonal matrix 
Tjy — (l/X y )I nf and X y is the precision parameter of the observational error. 

2.2.1 Priors for the GPs and MCMC 

The posterior distribution of calibration and statistical model parameters, (0, /i, A, p), takes the form 

7r (0, p, A, p| Y) oc L (Y|0, p, A, p) x tt (6) x tt (fi) x tt (A) x tt (p) , (9) 
where we abuse notation for prior distributions and denote the prior for 0, A and p as tv (0) = 

mf m h mi p+m f +mi 

Y[k (0f,i)xY[7V (0 h ,i)xY[7V (0l,i), 7V (\) = TT (X m )X7T (X 2 )X7T (Xf)XTT (X y ) and TV (p) = Y[ TV (p m j ) \ 

i=l i=l i=l i=l 

p+m f +m h p 

Yl 71 (P2,i) x II (p/,i), respectively. 

i=i i=i 

In practice, we have to estimate 6, A and p. We use Markov chain Monte Carlo (MCMC) to 
sample from the posterior distribution of the parameters, given observations and simulations. In order 
to simplify the prior specification, the responses are standardized to have mean zero and variance 1 
(e.g., Linkletter et al., 2006). Hence, the prior for the precision of the marginal variance, A^, is chosen 
to encourage its values to be close to 1 - the idea being that the low fidelity model should capture 
much of the signal in the observations. We use a Gamma distribution for the prior for X m : 

tv(X Vi ) oc X% 1 exp{-b m X m }. 
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When expert knowledge is unavailable, we have found that a m = b m = 5 (Higdon et al., 2004) works 
reasonably well as the choice centers the prior distribution at 1 with a reasonably large variance, 
thereby allowing for a fairly broad exploration of the posterior. Similarly, the priors chosen for the 
remaining precision parameters are also Gamma distributions. We also use the default values proposed 
by Higdon et al. (2004) for the hyperparameters of priors for the remaining precision parameters. The 
default choice of shape and scale parameters are a* = 1 and 6* = 0.001. This specification implies a 
relatively uninformative prior for these precision parameters, which encourages the data to choose a 
suitable value by itself. 

The components in p are bounded within the unit interval. Hence, a natural choice of prior for 
p* E p is the Beta distribution of the following form: 

7r(p*)«(pT*- 1 (l-P*)*'" 1 - 
Conventionally, the Beta priors are flat and centered at 1 with small variance (Williams et al., 2006). 
This is based on the prior belief that all the inputs are equally uncorrelated to the simulator and 
allow the data to decide the dependence of the simulator on the different inputs by moving the p's 
away from 1 in the posterior. In our experience, the default choices for these parameters, a* = 1 and 
6* = 0.001, suggested by Higdon et al. (2004) and Williams et al. (2006) encourage strong enough 
dependence in each of the parameters and work well in general. 

The posterior distribution for each parameter is explored using MCMC. Specifically, single-site 
Metropolis updates (Metropolis et al., 1953) are used for the components of p and 6. Proposals are 
made for each of these parameters from a uniform distribution centered at the parameter's current 
value. The widths of the uniform distributions (one for each component parameter) are pre-computed 
by running short MCMC runs and choosing a width that gives an acceptance rate of about 0.44 
(Gelman et al., 2004). Although this adjustment does not guarantee an acceptance probability of 
0.44, we have found this procedure is helpful at choosing widths resulting in acceptance ratios between 
0.25 and 0.75 and, more importantly, encourages the MCMC to converge. Good default choices for the 
widths for the updates can also be found using the method proposed by Graves (2005). For each of the 
precision parameters, we used Hastings updates (Hastings, 1970), where the proposed value is drawn 
from a uniform distribution centered at the current parameter value, with a width that is proportional 
to the parameter's current value. We have found that a width that is 0.3 times the current parameter 
value (proposed by Higdon et al., 2008) works fairly well in general. 
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2.3 Prediction 

The main goal of this endeavour is prediction. Given the posterior realizations from ([9]), predictions 
of the field measurement, lj(x new ), can be made at a new input setting x. new . The joint distribution 
between Y and lj(x new ), conditional on the parameters 0, A and p, is: 



MVN (0,2* 



Y 

Y f (x. new 

where the covariance matrix, J] new 5 is analogous to the covariance in (|8j) - there is an extra row and 
column in E 726 ™ as a result of appending Yf(x. new ) to Y. 

Through the usual properties of the multivariate normal distribution, the predictive distribution 
of Yf(x. new ), conditional on the parameters, is: 

Y f (x new ) | Y ~ MVN (fi pred , Z pred ) , (10) 

where fi pred = E^f (EJf ) _1 Y and Y> pred = Egf™ - Y%f w (Syf)" 1 Eyf. The matrices Ef/™ are 
sub-matrices of Y, new where 

^11 ^12 
^21 ^22 

The sub-matrix E^f™ is an (nj + + n{) x (ny + + nf) matrix, while E^f™ and E^f™ are of di- 
mension (rif + rih + n{) x 1 and 1 x (rif + + n/), respectively. The remaining sub-matrix, Srjf™, is 
a scalar. 

To make predictions, we first sample a vector of parameters from Q. Next, conditional on the 



sampled parameters, a prediction is sampled from (10). The sampling of parameters and predictions 



are repeated many times to provide estimated posterior quantities (e.g., posterior mean, variance or 
prediction intervals). 

3 Examples 

In this section, two examples are presented. The first example is a simple computer model that is 
used to demonstrate the proposed approach. After illustrating the implementation of our approach 
and some diagnostics to assess the adequacy of the model fit, a small simulation study is carried out 
to investigate the predictive performance of the proposed methodology. The second example is the 
application that motivated this work, and involves a radiative shock experiment conducted at CRASH. 



The main goal is to predict the observed field measurements given the outputs from two computer 
models and some field trials. 



3.1 Toy Example 

We begin with the "toy" example in Bastos and O'Hagan (2009), with some slight alterations. That 
is, the setting has been modified to accommodate two computer models and field experiments. In 
addition, we refashion the computer model to include two design variables, a common calibration 
parameter and calibration parameters that exist in each computer model, respectively. For simplicity, 
all the input settings and calibration parameters are chosen from the unit interval. 
We specify the low fidelity model as: 

yi(x,t f ,ti) = rji(-x,t f ,ti) 

f 1 \ 1000^? + 1900xf + 2092xi + 60 

x 1 1 J lOOO^x? + 500xf + 4xi + 20 



The high fidelity model is defined as the low fidelity response model plus a discrepancy term: 

x th 

y h (-x,t f ,t h ) = rji (x,t/,^) + 5exp(-t/) ^-^ r- 

100(x^ + lJ 

= rn(x,tf,ei) + 5 c (x,t f ,t h ). (12) 

To illustrate the proposed approach, we simulate outputs from the respective models. Following 
Loeppky et al. (2009), we used a 40 run random Latin hypercube design (Mackay et al., 1979) for the 
low fidelity simulator. Since, in practice, the high fidelity model is likely to be more computationally 
expensive than the low fidelity model, only 10 runs are generated - also chosen using a random Latin 
hypercube design. 

In most computer model applications, there are relatively few field observations. Consequently to 
mimic this setting, only 3 field observations were simulated from the mathematical model: 

y/( x) = ^,e f A) + s^,e f ,o h) + ^±^ o+ e 

= m {*,6 f A) + 5 c {*,e f ,e h ) + 5 f {x) + e, (13) 

where e ~ iV(0,0.5 2 ). 

In this set-up, the true value of the common calibration parameter is chosen to be Of = 0.2, while 
the calibration parameter appearing only in the high and low fidelity models are chosen to be Oh — 0.3 
and 0{ — 0.1, respectively. 
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Figure [T] displays the response surfaces for the two computer models and also the mean response 
surface for the field process. A quick glance at the figure reveals that the high fidelity model appears 
closer to the mean process than the low fidelity model. This represents the framework we are working 
within insofar as the high fidelity model is expected to be more like the true system than the low 
fidelity model. 



Low Fidelity Simulator, r, Hi 9 h Fidelit V Simulator, t, + 6 c 



Field, ri + 6 +6, 
' 1 c f 








o 



o 



Figure 1: From left to right: response surface of the low fidelity simulator, the high fidelity simulator 



and the mean of the physical process as outlined in equations (!!)-( 13). 



The posterior distribution of the model parameters was sampled using MCMC as outlined in 
Section 



2.2.1 



The MCMC chain is initialized with Of = 9^ = Q\ — 0.5 (i.e., the centre of the input 
space), X m = 1, A2 = Xf = X y = 20 and all the correlation parameters, p are chosen to be 0.1 as we 
assume that the simulator and discrepancy terms are dependent on all the inputs. Through visual 
inspection of the traceplots (not shown), we found that, for the data encountered in this example, 
convergence is achieved in the first 1,000 steps are so. The MCMC was run for 10,000 steps, where 
the first 2,000 steps are treated as burn-in and discarded in further analysis. 



In addition to the data simulated from (11) - (13) used to fit the proposed model (i.e., the training 



set), a validation dataset was generated from (13), so that the predictive performance can be evaluated. 



The validation set consisted of 25 field observations with input settings, x, chosen using random Latin 
hypercube sampling. We use the posterior mean prediction at x to estimate i/(x). Figure [2] shows the 
predicted versus actual values for each of the validation points. The figure shows that the predictive 
model performs reasonably well since the points center around the y — x line. 

Figure [3] displays the deviations of the predictions from the true values plotted against the predic- 
tions and also the input settings in each dimension. In each case, no obvious pattern is found in the 
plots, suggesting the outputs have similar degree of smoothness across the input space and that no 
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Figure 2: Predicted versus actual field measurements of the validation set (with the y = x line) 
obvious systematic behaviour was unaccounted for. 
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Figure 3: Diagnostics plots for the simple example: (a) Prediction error against predictions; (b) 
Prediction error against x\\ (c) Prediction error against X2- 

While not the specific goal of the proposed methodology, we now consider the estimation of the 
calibration parameters. Figure [4] shows the estimated one-dimensional and two-dimensional marginal 
posterior distributions of the calibration parameters. Solid vertical lines are plotted at the true values of 
the calibration parameters. In general, these posterior distributions can be interpreted as representing 
the uncertainty in the calibration parameters given the very limited number of observations and small 
numbers of simulations from imperfect computer models. A quick glance at the plots reveals that, 
except for the calibration parameters are not being constrained by the data. It is not too surprising 
that we can constrain but not the calibration other parameters, since there are more outputs 
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1 ■ 

0.5 
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(comparisons between the low and high fidelity models) to inform this parameter. The inability to 
constrain the other calibration parameters is due to the presence of the discrepancy terms £/(•), and 
the dearth of data. 



Two-dimensional marginals for the posterior distribution of the three parameters 
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1 



Figure 4: The diagonals show the marginal posterior distributions of the calibration parameters, with 
the true values marked with vertical lines. The off-diagonals sub-plots contain the two-dimensional 
marginal posterior distributions for the three calibration parameters. The solid lines represent the 
95% high posterior density region. 



Figure [5] shows the estimated posterior distributions of the calibration parameters for different 
sample sizes. Panels (a), (b) and (c) are the results of the simulations with (a) n\ — — 20, rif = 3, 
(b) n\ — rih — rif = 40 and (c) n\ — — rif = 100. The first case was chosen as a more simulation 
rich version of the above example. Comparing panel (a) of Figure [5] with the results in Figure [4j we 
see that the mode of the posterior distribution of 9\ is closer to the true value (solid line) and there is 
less variability in the posterior distribution when there are more simulations. However, very little is 
learned about the calibration parameters 9^ and Of. To gain more information on these parameters, 
there needs to be more field observations. Panels (b) and (c) consider cases where the number of 
simulations and field trials is larger than before. As the number of observations and simulations 
increases, the model is able to better estimate the calibration parameters. An interesting observation 
is that the shared calibration parameter 0/ is better constrained in panel (b) than 9^. The reason for 
this, we surmise, is that given the same number of field trials both the low and high fidelity models 
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Two-dimensional marginals for the posterior distribution of the three parameters 



Two-dimensional marginals for the posterior distribution of the three parameters 
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(a) m =n h = 20, rif = 3 



(b) m = n h = rif = 40 



Two-dimensional marginals for the posterior distribution of the three parameters 
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(c) m = n h = rif = 100 

Figure 5: The diagonals show the marginal posterior distributions of the calibration parameters, with 
the true values marked with vertical lines. The off-diagonals sub-plots contain the two-dimensional 
marginal posterior distributions for the three calibration parameters. The solid lines represent the 
95% high posterior density region. 



help inform Of, but only the high fidelity model directly informs 9^. When there are relatively many 
simulations and observations, all of the calibration parameters tend to be well constrained (panel (c)). 

A subsequent simulation study is performed to compare predictions of the new model with ap- 
proaches that only use some of the simulations. Models Dl and D2 are implementations of the 
Kennedy and O'Hagan (2001) approach using only the low fidelity and only the high fidelity outputs, 
respectively. Predictions from these models are compared with those from model D3, the proposed 
methodology. In other words, we are investigating whether the proposed approach of combining all 
simulations and observations is better in some sense than the Kennedy and O'Hagan (2001) method 
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using one of either the low fidelity model or high fidelity model outputs alone. 

The simulation study is carried out as follows. Using random Latin hypercube sampling, 100 sets 
of training and validation data are first generated independently. Each training set contains the same 
number of outputs as the above: 40 simulated values from the low fidelity simulator, 10 computer 
runs from the high fidelity simulator and 3 field observations. For each simulated training set, models 
Dl, D2 and D3 are estimated, and predictions of the validation set are obtained from each model. 
The predictions are evaluated by computing the root mean squared prediction errors (RMSPE) for 
the validation data. This is done for each of the 100 simulated training and validation datasets. The 
simulation study results are summarized in Figure [6j 

Figure [6] reveals that the RMSPE from the proposed model is consistently smaller than RMSPE 
of the other two models. Interestingly, in panel (a), we notice that the RMSPE is larger for the high 
fidelity model than the low fidelity model. This is the result of having relatively few runs of the high 
fidelity code. Looking at Figure [6(b)" , when n\ — — 20 and rif = 3, prediction using the higher 
fidelity outputs does better than prediction using only the low fidelity outputs. In either case, the 
proposed approach that uses all sources of data tends to do better in terms of RMSPE. 





(a) ni — 40, — 10 and rif = 3 



(b) rii — rih — 20 and rif = 3 



Figure 6: Boxplots of the RMSPE obtained from the 100 simulated datasets analyzed using models 
D1-D3. 



In general, we found that the proposed model that makes use of all the simulations works well in 
making predictions for the physical system. The simulation demonstrates that more efficient estimation 
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is gained through this approach. Although calibration is not the priority, we come across a similar issue 
encountered by Kennedy and O'Hagan (2001) - calibration is difficult with limited amounts of data. 
However, as the number of outputs increases, more information is available to calibrate the parameters 
of interest. In the case of calibration, it is important to note what is being achieved. That is, the 
posterior distributions reflect the uncertainty in the calibration parameters given the observations and 
the imperfect computer models. 

3.2 CRASH Application 

The application that motivated the proposed methodology arises from radiative shock experiments at 
CRASH. Figure [7] gives a diagram of the system that we want to predict. In the physical experiments, 
a high energy laser pulse irradiates a thin disk of beryllium at the front end of a xenon filled tube. 
The energy deposited in the surface causes the beryllium to ablate. A shock wave is then driven by 
the ablation pressure through the beryllium disk. After the shock wave breaks out of the beryllium 
disk, the disk acts as a piston, propagating the shock at a high speed into the xenon. When the 
xenon is shocked, it is heated to temperatures well over 100,000 °K and emits thermal x-ray radiation. 
These shocks are considered radiative when the radiation energy flux from the shock is high enough 
to impact the structure of the shock wave. Details regarding the radiative shock physics can be found 
in Drake et al. (2011). The radiating shock experiments that we are concerned with can be viewed 
as small-scale experiments for understanding astrophysical shock waves and other high temperature 
phenomena (McClarren et al., 2011; Drake et al., 2011). Several measurements of interest are taken 
from each shock experiment and also simulations. We focus here on the time taken for the shock wave 
to exit the beryllium disk (breakout time). 

Using two different radiation- hydrodynamics codes (ID-CRASH and 2D-CRASH), we aim to pre- 
dict the shock breakout time. The 2D-CRASH code includes two-dimensional processes and interac- 
tions that the one-dimensional code, or ID-CRASH, does not. As a result, the 2D-CRASH model is 
assumed to be able to model the experiments better than the ID-CRASH code, but it is also more 
computationally expensive. 

The design variables for this experiment are the thickness of the beryllium disk (xi) and laser 
energy (^2). The electron flux limiter is calibration input to both computer models and is denoted as 
tf. The laser energy scale factor is an additional calibration parameter, £/, required by the ID-CRASH 
code but not the 2D-CRASH simulator. The high fidelity computer code has two calibration inputs 
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Beryllium Disk 




Figure 7: A pictorial version of the apparatus used in the radiative shock experiments. The black 
vertical bar represents the beryllium disk where the laser (in blue) deposits energy. The shock wave 
breaks through the beryllium disk and moves down the xenon filled tube (horizontal bar). 

- beryllium Gamma (t^i) and wall opacity scale factor (£^2). All the inputs are scaled to the unit 
interval before fitting the data to the proposed model. 

We have 365 simulations from ID-CRASH and 104 2D-CRASH runs available. The designs for each 
computer experiment were Latin hypercube designs, optimized using a space- filling criterion (Johnson 
et al., 1990). There are also 8 experiments that were conducted at the OMEGA Laser Facility at the 
University of Rochester where the breakout time was recorded (Boehly et al., 1997). 

The MCMC was set up as in the previous examples, with one exception. From previous usage 
of the laser, it was known that the variance of the observation error was about 50 x 10 -12 seconds 

- or approximately 1 after standardizing. A Gamma distribution with shape and scale parameter 
(10,000, 10,000) was chosen for the prior of X y . This is an informative prior that tightly centers the 
Gamma distribution at 1. The widths for the Metropolis updates are chosen as outlined in Section 



2.2. 1| We found that convergence was achieved shortly after 1,000 MCMC steps. So, the MCMC was 
run for 10,000 steps and the first 2,000 were discarded as burn-in. 

Similar to the previous example, the deviations of the predictions from the observed breakout times 
are plotted against the predictions and the two input settings (diagnostic plots not shown). No obvious 
pattern is found in any of the diagnostic plots, thereby suggesting that the model fit is adequate. 

A leave-one-out study is conducted to evaluate the predictive ability of the new approach. That is, 
we delete an observation, fit the proposed model and predict the deleted observation. This is done for 
each of the 8 observations. Figure 15] is a plot of the resulting predictions against the observed breakout 
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time. The 95% posterior prediction interval for each point is shown in the figure. The predictions are 
fairly close to the observed values and, thus most points are near to the y = x line. However, the 
second observation from the left gives a prediction interval that fails to capture the observation. 




Figure 8: Predicted versus actual breakout times and 95% prediction intervals. 



Two-dimensional marginals for the posterior distribution of the four parameters 




0.03 0.1 
Electron Flux Limiter 



1.4 1.750.7 1.3 0.2 0.54 

Be Gamma Wall Opacity Energy 

Scale Factor Scale Factor 



Figure 9: The diagonals show the marginal posterior distributions of the calibration parameters. 
The off-diagonals sub-plots contain the two-dimensional marginal posterior distributions for the four 
calibration parameters. The solid lines represent the 95% high posterior density region. 
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Plots of the marginal posterior distributions of the calibration parameters are shown in Figure 
[9| The posterior distributions for all the calibration parameters, except the energy scale factor, are 
not constrained in this application. However, the posterior distributions have clear modes to suggest 
plausible values for the calibration parameters. 

4 Discussion 

So far, we have focused on the setting where there are only two computer models. The new method- 
ology, however, can easily be extended to model applications that involve more than two simulators. 
Here, such extensions, as well as limitations of this model, are mentioned. 

Suppose that there are H simulators denoted as rjk (•) for k = 1, . . . , if, where %(•) is the next 
highest level of fidelity model from 77^-1 (•). The simulators share the same design variables, x, and 
some common calibration parameters, if. The additional calibration parameters required by each of 
the respective computer model are denoted as tfc, for k = 1, . . . , K . The outputs of the lowest fidelity 
computer model are denoted as: 

Yl (x,t/,ti) = 7/1 (x,t/,ti) . 

The outputs from the higher fidelity simulators can then be written as a combination of the lowest 
fidelity simulator and discrepancy functions that capture the systematic differences between each pair 
of simulators. For k = 2, . . . , if, the simulated outputs are written: 

Y k (x,tf,t h ) = 77fc(x,t/,tfc) 

k-i 

= Vl ( x > t/, 61) + ^ S 3 ( X > */> °j) + S k ( x > */ 3 *fc) • 

Field measurements are also available to build the predictive model. The experimental observations 
are represented with the highest fidelity simulator and are written as the sum of the low fidelity 
simulator and discrepancy functions: 

Y/(x) = r/x(x,0 / ,0 K )+5 / (x) + e 

K 

= m (x, e f , 01) + 6 j ( x ' e f> + s f ( x ) + e > 

J=2 
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where Sf (x) measures the discrepancy between the highest fidelity computer model and physical 
process. The response surfaces of the different sources of data are modelled with GPs with mean and 
covariance functions discussed in Section [2^21 



Some care should be taken in the prior specification for the precision parameters for the GPs. We 
have found that the default choices of prior distributions outlined in Section [2?2] work fine in most cases 



(e.g., the simulations in Section 3.1). However, for some datasets, extremely large values of X y are 
observed. This amounts to essentially a model with no measurement error and discrepancies that are 
interpolating the noise. We noticed the phenomenon when the default priors are used for the CRASH 
example. This can also happen with the model proposed by Kennedy and O'Hagan (2001). In our case, 
we avoided this problem because we had a more informative prior distribution for X y . Alternatively, 
one can address this issue by rejecting small values of a precision parameter in the MCMC (this was 
done in Higdon et al. (2004)), or at the design stage by taking replicate field observations. 

A further note of caution with respect to the experimental design. The design regions for the 
computer experiments should coincide to avoid uncertainty due to extrapolation in the discrepancies 
between models. Suppose for example, the design for tf in the low fidelity simulator explores a much 
larger region than the design for the high fidelity model. When predictions are made, the proposed 
approach averages over the posterior distribution of the calibration parameters. For values of Of from 
the posterior that are outside of the range explored by the design of the high fidelity model, the 
proposed approach extrapolates ^(O- This results in larger prediction intervals. 

Lastly, we do not address problems with design variables that appear in only some, but not all, 
simulators. This is a topic for future work. 

5 Conclusion 

A new methodology, which combines outputs from multi-fidelity computer models and field observa- 
tions, is proposed. The approach successfully uses a Bayesian hierarchical model to make predictions 
of the physical system with associated measurements of uncertainty (e.g., posterior variance or pre- 
diction intervals). Different GPs are used to model the various response surfaces. The real example 
that motivated this work used two simulators of the process, but methodology can be easily extended 
to cases with more than two simulators. 
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